Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to change error constructor #445

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

mysteriouslyseeing
Copy link

Closes #444.

@mysteriouslyseeing mysteriouslyseeing changed the title Added initial functionality to error callback Add option to change error constructor Dec 2, 2024
@mysteriouslyseeing
Copy link
Author

I'm not sure error_callback is the right name for the attribute. Would error_constructor be more clear?

@mysteriouslyseeing mysteriouslyseeing marked this pull request as ready for review December 2, 2024 14:34
@jeertmans
Copy link
Collaborator

Hi @mysteriouslyseeing, thanks for your PR!

What about using a similar syntax to skip:

#[logo(error = <ErrorType>)]`
#[logo(error(<ErrorType> [, callback]))]

?

E.g., with the new syntax:

#[logo(error(Error, callback))]

`#[logos(error = SomeType)]`
`#[logos(error_callback = callback)]`

is now expressed as

`#[logos(error(SomeType, callback))]`
@@ -34,7 +34,8 @@ The type `ErrorType` can be any type that implements `Clone`, `PartialEq`,
`Default` and `From<E>` for each callback's error type.

`ErrorType` must implement the `Default` trait because invalid tokens, i.e.,
literals that do not match any variant, will produce `Err(ErrorType::default())`.
literals that do not match any variant, will produce `Err(ErrorType::default())`,
unless you provide a callback with the alternate syntax `#[logos(error(ErrorType, callback = ...))]`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. Just one last comment: can you add a bit more details about the callback (signature) and maybe link to the example you wrote? Otherwise, users will not know it is documented :-)

Otherwise, it's all good to be merged!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually just changed the custom_error example to use the new error syntax instead of adding a new one, and the example is already linked in the book

@mysteriouslyseeing
Copy link
Author

Just a heads-up that adding a defaulted method to a trait (which this PR does) is listed as a 'possibly-breaking' change in the cargo book's SemVer section.

@jeertmans
Copy link
Collaborator

Just a heads-up that adding a defaulted method to a trait (which this PR does) is listed as a 'possibly-breaking' change in the cargo book's SemVer section.

Good point, meaning that it should be released under a version that either bumps major or minor part.

@jeertmans
Copy link
Collaborator

According to CI's benchmarks, this PR results in a notable decrease in performance, but I don't really see why: https://github.com/maciejhirsz/logos/actions/runs/12214084915?pr=445.

@mysteriouslyseeing
Copy link
Author

I ran the benchmarks on my machine and got this:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.19    618.1±2.13ns  1201.8 MB/sec    1.00    519.2±2.93ns  1431.0 MB/sec
count_ok/keywords_operators_and_punctators    1.13  1711.9±11.62ns  1187.2 MB/sec    1.00  1511.9±26.52ns  1344.2 MB/sec
count_ok/strings                              1.00    409.8±1.47ns  2027.0 MB/sec    1.38    565.8±3.83ns  1468.1 MB/sec
iterate/identifiers                           1.14   590.5±11.93ns  1258.2 MB/sec    1.00    516.1±3.76ns  1439.5 MB/sec
iterate/keywords_operators_and_punctators     1.07  1612.6±16.14ns  1260.3 MB/sec    1.00  1501.0±16.17ns  1354.0 MB/sec
iterate/strings                               1.00    410.9±2.81ns  2021.8 MB/sec    1.38    567.5±4.36ns  1463.6 MB/sec

Looks like both strings tests are definitely slower given both the benchmarks. I've got no idea why.

@jeertmans
Copy link
Collaborator

I ran the benchmarks on my machine and got this:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.19    618.1±2.13ns  1201.8 MB/sec    1.00    519.2±2.93ns  1431.0 MB/sec
count_ok/keywords_operators_and_punctators    1.13  1711.9±11.62ns  1187.2 MB/sec    1.00  1511.9±26.52ns  1344.2 MB/sec
count_ok/strings                              1.00    409.8±1.47ns  2027.0 MB/sec    1.38    565.8±3.83ns  1468.1 MB/sec
iterate/identifiers                           1.14   590.5±11.93ns  1258.2 MB/sec    1.00    516.1±3.76ns  1439.5 MB/sec
iterate/keywords_operators_and_punctators     1.07  1612.6±16.14ns  1260.3 MB/sec    1.00  1501.0±16.17ns  1354.0 MB/sec
iterate/strings                               1.00    410.9±2.81ns  2021.8 MB/sec    1.38    567.5±4.36ns  1463.6 MB/sec

Looks like both strings tests are definitely slower given both the benchmarks. I've got no idea why.

Looks like one of the test also increased significantly, that's also weird. I send an e-mail to Logos' author to see if he can enable CodSpeed, which might provide more accurate benchmarks.

@mysteriouslyseeing
Copy link
Author

The problem is on lines 381 and 385 of src/lexer.rs:

#[cfg(not(feature = "forbid_unsafe"))]
{
    self.token = core::mem::ManuallyDrop::new(Some(Err(Token::make_error(self))));
}
#[cfg(feature = "forbid_unsafe")]
{
    self.token = Some(Err(Token::make_error(self)));
}

Previously, Token::make_error(self) was Token::Error::default().

Maybe the problem is that the compiler can't optimise that like it used to? Setting self.token and creating the error both require references to self now - but surely it should be able to see that the reference isn't actually used?

@mysteriouslyseeing
Copy link
Author

Here are my benchmarks after reverting that specific line:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.00    619.9±5.36ns  1198.5 MB/sec    1.00    618.1±1.59ns  1201.8 MB/sec
count_ok/keywords_operators_and_punctators    1.00  1709.8±10.51ns  1188.6 MB/sec    1.00  1714.4±13.83ns  1185.4 MB/sec
count_ok/strings                              1.00    410.1±0.73ns  2025.7 MB/sec    1.00    409.9±0.89ns  2026.7 MB/sec
iterate/identifiers                           1.01    591.2±4.63ns  1256.5 MB/sec    1.00    587.3±1.76ns  1264.9 MB/sec
iterate/keywords_operators_and_punctators     1.00  1620.6±21.71ns  1254.1 MB/sec    1.00  1613.9±17.77ns  1259.2 MB/sec
iterate/strings                               1.00    409.9±1.82ns  2026.3 MB/sec    1.00   409.2±10.04ns  2030.1 MB/sec

Copy link
Collaborator

@jeertmans jeertmans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your analysis @mysteriouslyseeing! Let's see if hinting the compiler that it should inline the default implementation can help, see my suggestion.

src/lib.rs Outdated
Comment on lines 82 to 84
fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
Self::Error::default()
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
Self::Error::default()
}
#[inline(always)]
fn make_error(_lexer: &mut Lexer<'source, Self>) -> Self::Error {
Self::Error::default()
}

We can probably try this to see if that helps the compiler.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Performance is unchanged, unfortunately:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.20   620.0±18.12ns  1198.2 MB/sec    1.00   518.1±28.66ns  1433.8 MB/sec
count_ok/keywords_operators_and_punctators    1.12  1731.6±95.53ns  1173.6 MB/sec    1.00  1552.2±150.95ns  1309.3 MB/sec
count_ok/strings                              1.00    410.6±6.04ns  2023.2 MB/sec    1.40   574.0±12.94ns  1447.1 MB/sec
iterate/identifiers                           1.15   594.5±30.62ns  1249.7 MB/sec    1.00   518.0±27.87ns  1434.3 MB/sec
iterate/keywords_operators_and_punctators     1.04  1623.9±49.04ns  1251.5 MB/sec    1.00  1559.1±111.50ns  1303.5 MB/sec
iterate/strings                               1.00   412.3±32.70ns  2014.8 MB/sec    1.39   571.8±16.91ns  1452.7 MB/sec

@mysteriouslyseeing
Copy link
Author

To get it to work, I changed the signature and default implementation of make_error:

    #[inline(always)]
    #[doc(hidden)]
    fn make_error(lexer: &mut Lexer<'source, Self>) {
        use internal::LexerInternal as _;
        lexer.set_error(Self::Error::default())
    }

This uses a new function on LexerInternal:

    #[inline]
    fn set_error(&mut self, error: Token::Error) {
        #[cfg(not(feature = "forbid_unsafe"))]
        {
            self.token = core::mem::ManuallyDrop::new(Some(Err(error)));
        }
        #[cfg(feature = "forbid_unsafe")]
        {
            self.token = Some(Err(error));
        }
    }

Now the error function on LexerInternal looks like this:

    #[inline]
    fn error(&mut self) {
        self.token_end = self.source.find_boundary(self.token_end);
        Token::make_error(self);
    }

Mixing around the functions like this does make it harder to understand but it does fix the performance regression:

group                                         default_before                         default_changes
-----                                         --------------                         ---------------
count_ok/identifiers                          1.00   620.0±18.12ns  1198.2 MB/sec    1.00    617.1±1.24ns  1203.8 MB/sec
count_ok/keywords_operators_and_punctators    1.03  1731.6±95.53ns  1173.6 MB/sec    1.00   1681.4±9.95ns  1208.7 MB/sec
count_ok/strings                              1.00    410.6±6.04ns  2023.2 MB/sec    1.00    409.9±0.73ns  2026.3 MB/sec
iterate/identifiers                           1.02   594.5±30.62ns  1249.7 MB/sec    1.00    583.4±1.41ns  1273.4 MB/sec
iterate/keywords_operators_and_punctators     1.00  1623.9±49.04ns  1251.5 MB/sec    1.00  1620.8±22.91ns  1253.9 MB/sec
iterate/strings                               1.01   412.3±32.70ns  2014.8 MB/sec    1.00    407.5±2.00ns  2038.4 MB/sec

And here is the comparison between the old and new changes:

group                                         default_changes_old                     default_changes
-----                                         -------------------                     ---------------
count_ok/identifiers                          1.00   518.1±28.66ns  1433.8 MB/sec     1.19    618.0±2.73ns  1202.0 MB/sec
count_ok/keywords_operators_and_punctators    1.00  1552.2±150.95ns  1309.3 MB/sec    1.08   1681.9±9.45ns  1208.3 MB/sec
count_ok/strings                              1.40   574.0±12.94ns  1447.1 MB/sec     1.00    410.3±0.75ns  2024.5 MB/sec
iterate/identifiers                           1.00   518.0±27.87ns  1434.3 MB/sec     1.14    589.9±4.94ns  1259.3 MB/sec
iterate/keywords_operators_and_punctators     1.00  1559.1±111.50ns  1303.5 MB/sec    1.04  1627.9±33.51ns  1248.4 MB/sec
iterate/strings                               1.39   571.8±16.91ns  1452.7 MB/sec     1.00    411.7±4.02ns  2017.7 MB/sec

It would be nice to know how identifiers was faster so we can get both performance increases but that should probably happen in another PR.

@jeertmans
Copy link
Collaborator

jeertmans commented Dec 12, 2024

Thanks for your efforts @mysteriouslyseeing! I am not sure to understand your two benchmark results: the first is this PR's changes, and the second is the change between last commit and second-last commit?

@mysteriouslyseeing
Copy link
Author

Thanks for your efforts @mysteriouslyseeing! I am not sure to understand your two benchmark results: the first is this PR's changes, and the second is the change between last commit and second-last commit?

Exactly, yes

@jeertmans
Copy link
Collaborator

Hum, benchmarks results are, surprising, as they seem to both indicate performance increase and decrease: https://github.com/maciejhirsz/logos/actions/runs/12279903326/attempts/2#summary-34310133559.

@mysteriouslyseeing
Copy link
Author

mysteriouslyseeing commented Dec 12, 2024

Hmmm, that is strange. Looks like it's mostly increases? Identifiers are worse, but only without forbid_unsafe?
Unfortunately I probably can't investigate because my local benchmarks are showing little to no difference in performance.

@jeertmans
Copy link
Collaborator

Hmmm, that is strange. Looks like it's mostly increases? Identifiers are worse, but only without forbid_unsafe? Unfortunately I probably can't investigate because my local benchmarks are showing little to no difference in performance.

I'll hold this for one or two more weeks, to see if @maciejhirsz can enable CodSpeed (it should provide better instruments for precise benchmarking). After that, I will try to run benchmark locally and see if it is safe to merge :-) Sorry for the delay!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add option to change error constructor
2 participants